LSI vs. Wordnet Ontology in Dimension Reduction for Information Retrieval
Authors
Abstract
In the area of information retrieval, the dimension of document vectors plays an important role. First, as the dimension grows, index structures suffer from the “curse of dimensionality” and their efficiency rapidly decreases. Second, users may not search with the exact words that appear in a document, so some relevant documents are missed. LSI (Latent Semantic Indexing) is a numerical method that discovers the latent semantics of documents by creating concepts from existing terms. However, LSI is computationally expensive. In this article, we propose replacing LSI with a projection matrix created from the WordNet hierarchy and compare this approach with LSI.
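To make the comparison in the abstract concrete, here is a minimal, dependency-free sketch in Python. It rests on several assumptions that are not from the paper: the term-document counts are invented; `HYPERNYM` is a hypothetical hand-written table standing in for a real WordNet hypernym lookup; each term maps to exactly one concept with weight 1; and LSI is computed by power iteration on AᵀA instead of a library SVD, only to avoid dependencies. The abstract does not say how the authors build or weight their projection matrix, so the helper names (`projection_matrix`, `lsi`, `fold_in`) and k = 2 are illustrative choices, not the authors' method. What the sketch shows is the structural difference: LSI has to decompose the term-document matrix, while the ontology approach only multiplies vectors by a fixed concepts × terms matrix.

```python
import math

# Invented toy term-document matrix A: rows are terms, columns are documents.
TERMS = ["car", "automobile", "truck", "apple", "banana"]
A = [
    [2, 0, 0],  # car
    [0, 1, 0],  # automobile
    [1, 1, 0],  # truck
    [0, 0, 3],  # apple
    [0, 0, 1],  # banana
]

# Hypothetical stand-in for a WordNet hypernym lookup (not generated from WordNet).
HYPERNYM = {"car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
            "apple": "fruit", "banana": "fruit"}
CONCEPTS = sorted(set(HYPERNYM.values()))  # ["fruit", "vehicle"]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def columns(M):
    """Transpose: the columns of M as lists (for A, one vector per document)."""
    return [list(col) for col in zip(*M)]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 0.0 if nx == 0 or ny == 0 else dot / (nx * ny)


def projection_matrix(terms, concept_of, concepts):
    """Binary concepts x terms matrix P: P[c][t] = 1 if term t maps to concept c."""
    return [[1.0 if concept_of[t] == c else 0.0 for t in terms] for c in concepts]


def top_eigenpairs(B, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation.
    Assumes B v never becomes the zero vector (true for this toy example)."""
    n = len(B)
    B = [list(map(float, row)) for row in B]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = matvec(B, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(B, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest eigenvalue.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsi(A, k):
    """Rank-k LSI: singular values and right singular vectors of A, via A^T A."""
    pairs = top_eigenpairs(matmul(columns(A), A), k)
    return [math.sqrt(lam) for lam, _ in pairs], [v for _, v in pairs]


def fold_in(A, sigmas, V, q):
    """Project a term vector into LSI space: q_hat = Sigma^-1 U^T q, u_i = A v_i / sigma_i."""
    q_hat = []
    for s, v in zip(sigmas, V):
        u = [x / s for x in matvec(A, v)]
        q_hat.append(sum(ut * qt for ut, qt in zip(u, q)) / s)
    return q_hat


query = [1.0 if t == "automobile" else 0.0 for t in TERMS]
docs = columns(A)

# Raw term space: only documents literally containing "automobile" match.
raw_scores = [cosine(query, d) for d in docs]

# Ontology projection: a fixed matrix product, no decomposition required.
P = projection_matrix(TERMS, HYPERNYM, CONCEPTS)
onto_scores = [cosine(matvec(P, query), matvec(P, d)) for d in docs]

# LSI with k = 2: each document is represented by its coordinates in V_k.
sigmas, V = lsi(A, k=2)
q_hat = fold_in(A, sigmas, V, query)
lsi_scores = [cosine(q_hat, [v[j] for v in V]) for j in range(len(docs))]

print([round(s, 2) for s in raw_scores])   # [0.0, 0.71, 0.0]
print([round(s, 2) for s in onto_scores])  # [1.0, 1.0, 0.0]
print([round(s, 2) for s in lsi_scores])   # approximately [1.0, 1.0, 0.0]
```

The query "automobile" scores zero against the first document in the raw term space, because that document only contains "car" and "truck". Both reductions retrieve it: the projection does so from the hand-written concept table alone, whereas LSI has to infer a "vehicle" dimension from co-occurrence (here, "truck" appearing alongside both "car" and "automobile") and needs an eigendecomposition to do it.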
Similar resources
Using BFA with wordnet ontology based model for web retrieval
In the area of information retrieval, the dimension of document vectors plays an important role. We may need to find a few words or concepts that characterize a document based on its contents to overcome the problem of the "curse of dimensionality", which makes indexing of high-dimensional data problematic. To do so, we earlier proposed a Wordnet and Wordnet+LSI (Latent Semantic Indexing) b...
Explicit vs. Latent Concept Models for Cross-Language Information Retrieval
The field of information retrieval and text manipulation (classification, clustering) still strives for models allowing semantic information to be folded in to improve performance with respect to standard bag-of-words-based models. Many approaches aim at a concept-based retrieval, but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such ...
Lower Dimensional Representation of Text Data in Vector
Dimension reduction in today's vector space based information retrieval system is essential for improving computational efficiency in handling massive data. In this paper, we propose a mathematical framework for lower dimensional representation of text data in vector space based information retrieval using minimization and matrix rank reduction formula. We illustrate how the commonly used Latent ...
Explicit Versus Latent Concept Models for Cross-Language Information Retrieval
The field of information retrieval and text manipulation (classification, clustering) still strives for models allowing semantic information to be folded in to improve performance with respect to standard bag-of-words-based models. Many approaches aim at a concept-based retrieval, but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such ...
Public Transport Ontology for Passenger Information Retrieval
Passenger information aims at improving the user-friendliness of public transport systems while influencing passenger route choices to satisfy transit users’ travel requirements. The integration of transit information from multiple agencies is a major challenge in the implementation of multi-modal passenger information systems. The problem of information sharing is further compounded by the multi-l...